HRTF Measurements of a KEMAR Dummy-Head Microphone

Bill Gardner and Keith Martin
MIT Media Lab

Abstract: 

An extensive set of head-related transfer function (HRTF) measurements of a KEMAR dummy head 
microphone was completed in May 1994. The measurements consist of the left and right ear impulse
responses from a Realistic Optimus Pro 7 loudspeaker mounted 1.4 meters from the KEMAR.
Maximum length (ML) pseudo-random binary sequences were used to obtain the impulse responses at 
a sampling rate of 44.1 kHz. A total of 710 different positions were sampled at elevations from -40 
degrees to +90 degrees. Also measured were the impulse response of the speaker in free field and 
several headphones placed on the KEMAR. This data is being made available to the research 
community on the Internet via anonymous FTP and the World Wide Web. 




Keith, KEMAR, and Bill in MIT's anechoic chamber. 



The FTP and HTML archive is maintained by Bill Gardner, billg@media.mit.edu, and Keith Martin,
kdm@media.mit.edu. It contains a set of head-related transfer function (HRTF) measurements of a
KEMAR dummy-head microphone and related software. 

The archive contains: 

README (1K) - A brief description of the archive

KEMAR-FAQ.txt (25K) - Frequently Asked Questions - read this

compact.tar.Z (200K) - Compact HRTF measurements

compact.zip (226K) - Compact HRTF measurements, Windows zip file

full.tar.Z (1.3M) - Complete set of HRTF measurements, speaker and headphone responses

full.zip (1.4M) - Complete set of HRTF measurements, Windows zip file

headphones+spkr.zip (41K) - Speaker and headphone responses, Windows zip file

hrtfdoc.ps (116K) - Postscript documentation on the HRTF data



http://sound.media.mit.edu/KEMAR.html 8/8/2000 
hrtfdoc.txt (18K) - Text-only version of hrtfdoc.ps

diffuse.tar.Z (197K) - diffuse-field equalized HRTFs (44.1 kHz)

diffuse.zip (221K) - diffuse-field equalized HRTFs (44.1 kHz), Windows zip file

diffuse32k.tar.Z (157K) - above resampled to 32 kHz sampling rate

3Daudio.tar.Z (203K) - demonstration 3D audio spatializer for SGI Indigos

matlab_scripts.tar.Z (11K) - useful MATLAB scripts for processing data

The file hrtfdoc contains a detailed description of the data and the measurement technique. It is also 
available online as an html document (hrtfdoc.html). It may also be obtained through the Media Lab
as Perceptual Computing Technical Report #280. Note that the diffuse-field equalized HRTFs, 3D 
audio program, MATLAB scripts, and Windows zip format files were added to this archive after the 
documentation was written. 



The Windows zip format files were generously provided by Robert Cain.

This data is Copyright 1994 by the MIT Media Laboratory. It is provided free with no restrictions on 
use, provided the authors are cited when the data is used in any research or commercial application. 

Bill Gardner billg@media.mit.edu and Keith Martin kdm@media.mit.edu

MIT Media Lab Machine Listening Group 



May 18, 1994 (last revised July 18, 2000)



HRTF Measurements of a KEMAR Dummy-Head Microphone

MIT Media Lab Perceptual Computing - Technical Report #280 

Bill Gardner and Keith Martin 
MIT Media Lab 
May, 1994 

abstract 

An extensive set of head-related transfer function (HRTF) measurements
of a KEMAR dummy head microphone has recently been completed. (In this
document, we use the acronym HRTF to refer to head-related impulse
responses. The impulse response and transfer function are related by
the Fourier transform.) The measurements consist of the left and right
ear impulse responses from a Realistic Optimus Pro 7 loudspeaker
mounted 1.4 meters from the KEMAR. Maximum length (ML) pseudo-random
binary sequences were used to obtain the impulse responses at a
sampling rate of 44.1 kHz. In total, 710 different positions were
sampled at elevations from -40 degrees to +90 degrees. Also measured
were the impulse response of the speaker in free field and several
headphones placed on the KEMAR. This data is being made available to
the research community on the Internet via anonymous FTP and the
World Wide Web.

Measurement technique 

Measurements were made using a Macintosh Quadra computer equipped with
an Audiomedia II DSP card, which has 16-bit stereo A/D and D/A
converters that operate at a 44.1 kHz sampling rate. One of the audio
output channels was sent to an amplifier which drove a Realistic
Optimus Pro 7 loudspeaker. This is a small two-way loudspeaker with a
4 inch woofer and 1 inch tweeter. The KEMAR, Knowles Electronics
model DB-4004, was equipped with model DB-061 left pinna, model DB-065 
(large red) right pinna, Etymotic ER-11 microphones, and Etymotic 
ER-11 preamplifiers. The outputs of the microphone preamplifiers were 
connected to the stereo inputs of the Audiomedia card. 

From the standpoint of the Audiomedia card, a signal sent to the audio 
outputs results in a corresponding signal appearing at the audio 
inputs. Measuring the impulse response of this system yields the 
impulse response of the combined system consisting of the Audiomedia 
D/A and A/D converters and anti-alias filters, the amplifier, the 
speaker, the room in which the measurements are made, and most 
importantly, the response of the KEMAR with its associated microphones 
and preamps. We can avoid interference due to room reflections by 
ensuring that any reflections occur well after the head response time, 
which is several milliseconds. We can compensate for a non-uniform 
speaker response by measuring the speaker response separately and 
creating an inverse filter. The inverse filter, when applied to an 
HRTF measurement, equalizes the speaker response to be flat.

The impulse responses were obtained using ML sequences (for a detailed
description of the ML sequence measurement technique, see [2]). The
sequence length was N = 16383 samples, corresponding to a 14-bit
generating register. Two copies of the sequence were concatenated to
form a 2*N sample sound which was played from the Audiomedia card.
Simultaneously, 2*N samples were recorded on both the left and right
input channels (we wrote software for the Audiomedia to simultaneously
play and record stereo sounds). For each input channel, the following

http://sound.media.mit.ed 8/8/2000 
technique was used to recover the impulse response. The first N
samples of the result were discarded, and the remaining N samples were
duplicated to form a 2*N sample sequence. This was cross-correlated
with the original N sample ML sequence using FFT based block
convolution, forming a 3*N - 1 sample result. The N sample impulse
response was extracted starting at N - 1 samples into this result.
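
The recovery procedure described above can be sketched in a few lines of
Python. This is a minimal illustration, not the software written for the
Audiomedia card: it assumes a 5-bit register (N = 31) rather than the
14-bit register used for the measurements, uses direct summation in place
of FFT based block convolution, and replaces the real playback/recording
chain with a hypothetical simulated system. The function names, the
feedback recurrence, and the final normalization step are assumptions
introduced for the example.

```python
def ml_sequence(n_bits=5, lags=(5, 3)):
    """Return one period (2**n_bits - 1 samples) of a +/-1 ML sequence.

    The recurrence a[k] = a[k-5] XOR a[k-3] is an assumed choice that is
    maximal length for a 5-bit register; a 14-bit register needs other lags.
    """
    length = 2 ** n_bits - 1
    bits = [1] * n_bits  # any non-zero starting state works
    while len(bits) < length:
        new_bit = 0
        for lag in lags:
            new_bit ^= bits[-lag]
        bits.append(new_bit)
    return [1.0 if b else -1.0 for b in bits]


def play_and_record(seq, system):
    """Simulate playing two copies of seq through a linear system (hypothetical)."""
    played = seq + seq
    recorded = []
    for i in range(len(played)):
        acc = 0.0
        for m, h_m in enumerate(system):
            if i - m >= 0:
                acc += h_m * played[i - m]
        recorded.append(acc)
    return recorded


def recover_impulse_response(recorded, seq):
    n = len(seq)
    steady = recorded[n:2 * n]   # discard the first N samples
    doubled = steady + steady    # duplicate to form a 2*N sample sequence
    # Lags 0..N-1 of the cross-correlation; these are the N samples that start
    # N - 1 samples into the full 3*N - 1 sample linear result.
    corr = [sum(doubled[lag + i] * seq[i] for i in range(n)) for lag in range(n)]
    # Assumed normalization: an ML sequence has circular autocorrelation N at
    # lag 0 and -1 elsewhere, which this step undoes exactly.
    total = sum(corr)
    return [(c + total) / (n + 1) for c in corr]


seq = ml_sequence()
true_response = [0.0, 1.0, 0.5, -0.25]   # toy system, not measured data
estimate = recover_impulse_response(play_and_record(seq, true_response), seq)
```

Discarding the first N recorded samples matters because only the second
copy of the sequence is heard by a system that has already reached its
periodic steady state, which is what makes the correlation circular.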

Noise in the ML sequence impulse responses can be attributed to
measurement noise, non-linearities in the system, and time aliasing.
Measurement noise can be averaged out by using longer ML sequences.
This is completely analogous to averaging smaller length measurements.
For instance, averaging two independent N point impulse response
measurements should achieve a 3 dB signal to noise ratio (SNR)
improvement over either of the measurements considered alone.
Similarly, using a 2*N(+1) point ML sequence should achieve a 3 dB SNR
improvement over using an N point ML sequence. However, noise caused
by non-linearities in the system will not be reduced by averaging
repeated measurements of the same ML sequence, because the noise is
correlated between measurements. It is necessary either to use longer
ML sequences or to average the responses resulting from different ML
sequences (i.e. from different masks) to reduce noise caused by
non-linearities (see [3]). Time aliasing can be eliminated by using
ML sequences which are longer than the reverberation time of the
measurement space. Since the measurements were done in an anechoic
chamber and the ML sequences were sufficiently long, time aliasing was
not a problem. We chose 16383 point measurements to give good signal
to noise ratios without excessive storage requirements or computation
time. The measured SNR was 65 dB, as discussed later.
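
The 3 dB figure follows from halving the noise power when two independent
measurements are averaged, since 10*log10(2) is about 3.01 dB. The sketch
below checks this numerically. It assumes white Gaussian noise and a
fixed random seed, and it models distortion as a term that repeats
identically in every measurement; all of these are simplifications for
illustration and the variable names are hypothetical.

```python
import math
import random


def noise_power(samples):
    return sum(x * x for x in samples) / len(samples)


rng = random.Random(0)   # fixed seed so the sketch is repeatable
n = 16383
noise_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
noise_b = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Averaging two measurements leaves the (identical) signal unchanged and
# averages the two independent noise records.
averaged = [(a + b) / 2.0 for a, b in zip(noise_a, noise_b)]

gain_db = 10.0 * math.log10(noise_power(noise_a) / noise_power(averaged))
expected_db = 10.0 * math.log10(2.0)   # about 3.01 dB

# Distortion that repeats identically in every measurement is not reduced.
distortion = [0.1 * math.sin(0.01 * i) for i in range(n)]
averaged_distortion = [(d + d) / 2.0 for d in distortion]
```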

Measurement procedure 

The measurements were made in MIT's anechoic chamber. The KEMAR was 
mounted upright on a motorized turntable which could be rotated 
accurately to any azimuth under computer control. The speaker was 
mounted on a boom stand which enabled accurate positioning of the 
speaker to any elevation with respect to the KEMAR. Thus, the
measurements were made one elevation at a time, by setting the speaker
to the proper elevation and then rotating the KEMAR to each azimuth.
With the KEMAR facing forward toward the speaker (0 degrees azimuth),
the speaker was positioned such that a normal ray projected from the
center of the face of the speaker bisected the interaural axis of the
KEMAR at a distance of 1.4 meters. This was accomplished using a tape
measure, plumb line, calculator, a 1.4 meter rod, and a fair amount of
eyeballing. We believe the speaker was always within 0.5 inch of the
desired position, which corresponds to an angular error of +/- 0.5
degrees.
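
The quoted angular error can be checked with simple trigonometry: a
0.5 inch displacement seen from 1.4 meters subtends about half a degree.
The short calculation below is only a sanity check; the function name is
an assumption.

```python
import math

INCH_IN_METERS = 0.0254


def angular_error_deg(position_error_m, distance_m):
    """Angle subtended by a sideways position error at a given distance."""
    return math.degrees(math.atan2(position_error_m, distance_m))


error_deg = angular_error_deg(0.5 * INCH_IN_METERS, 1.4)   # roughly 0.52
```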

The spherical space around the KEMAR was sampled at elevations from
-40 degrees (40 degrees below the horizontal plane) to +90 degrees
(directly overhead). At each elevation, a full 360 degrees of azimuth
was sampled in equal-sized increments. The increment sizes were
chosen to maintain approximately 5 degree great-circle increments.
The table below shows the number of samples and azimuth increment at
each elevation (all angles in degrees). A total of 710 locations were
sampled.

Elevation    Number of       Azimuth
             Measurements    Increment
  -40            56            6.43
  -30            60            6.00
  -20            72            5.00
  -10            72            5.00
    0            72            5.00
   10            72            5.00
   20            72            5.00
   30            60            6.00
   40            56            6.43
   50            45            8.00
   60            36           10.00
   70            24           15.00
   80            12           30.00
   90             1            x.xx

Table 1: Number of measurements and azimuth
increment at each elevation
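
The entries of Table 1 are easy to cross-check. The sketch below sums the
measurement counts, derives each azimuth increment as 360 divided by the
count, and estimates the great-circle step as the azimuth increment times
the cosine of the elevation. That last formula is a small-angle
approximation chosen for this example, and the names are hypothetical.

```python
import math

# Elevation (degrees) -> number of measurements, copied from Table 1.
MEASUREMENTS = {
    -40: 56, -30: 60, -20: 72, -10: 72, 0: 72, 10: 72, 20: 72,
    30: 60, 40: 56, 50: 45, 60: 36, 70: 24, 80: 12, 90: 1,
}

total = sum(MEASUREMENTS.values())


def azimuth_increment(elevation):
    return 360.0 / MEASUREMENTS[elevation]


def great_circle_step(elevation):
    """Approximate arc between neighbouring azimuths on the sphere, in degrees."""
    return azimuth_increment(elevation) * math.cos(math.radians(elevation))


# The single overhead point at 90 degrees has no azimuth step, so skip it.
steps = {e: round(great_circle_step(e), 2) for e in MEASUREMENTS if e != 90}
```

Under this approximation every elevation ring has a step between 4.5 and
5.5 degrees, which is consistent with the stated design goal.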



If the KEMAR were perfectly symmetrical and its ear microphones were
identical, we would only need to sample either the left or right
hemisphere around the KEMAR. However, our KEMAR had two different
pinnae (the left pinna was "normal", the right pinna was the "large
red" model), and consequently the responses were not identical. This
was actually a bonus, because by sampling the entire sphere we
obtained two complete sets of symmetrical HRTFs.

Speaker and headphone measurements 

The impulse response of the Optimus Pro 7 speaker was measured in the 
anechoic chamber using a Neumann KMi 84 microphone at a distance of 
1.4 meters. The measurement technique was exactly the same as the 
HRTF measurements. The speaker impulse response can be used to create 
an inverse filter to equalize the HRTF measurements, as will be 
discussed later. 

In addition to measuring the speaker response, we also measured a 
variety of headphones placed on the KEMAR. The headphones measured 
are listed in Table 2. 



AKG K240               Circumaural, closed earcups, but
                       not well isolated.
Sennheiser HD480       Supraaural, open air.
Radio Shack Nova 38    Supraaural, walkman style.
Sony Twin Turbo        Intraaural, earplug style.

Table 2: Description of headphones measured

It is possible the HRTF data will be used to create a spatial auditory
display, in which case the frequency response of the headphones used
to render the display is important. The above headphone responses may
be useful to create appropriate inverse filters. We did not gather
data on the repeatability of such measurements (i.e. how much
variation in the frequency response is expected each time the
headphones are placed on the head).

The data

As described earlier, each HRTF measurement yielded a 16383 point
impulse response at a 44.1 kHz sampling rate. Most of this data is
irrelevant. The 1.4 meter air travel corresponds to approximately 180
samples, and there is an additional delay of 50 samples inherent in
the playback/recording system. Consequently, in each impulse
response, there is a delay of approximately 230 samples before the
head response occurs. The head response persists for several hundred
samples (subject to interpretation) and is followed by various
reflections off objects in the anechoic chamber (such as the KEMAR
turntable). In order to reduce the size of the data set without
eliminating anything of potential interest, we decided to discard the
first 200 samples of each impulse response and save the next 512
samples. Each HRTF response is thus 512 samples long. Most
researchers will no doubt truncate this data further.
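
The delay figures above follow from the geometry and the sampling rate.
The arithmetic below reproduces them; the speed of sound (343 m/s) is an
assumed round value, since the document does not state the one used, and
the constant names are hypothetical.

```python
SAMPLE_RATE_HZ = 44100
DISTANCE_M = 1.4
SPEED_OF_SOUND_M_S = 343.0   # assumed value for air at roughly 20 degrees C
SYSTEM_DELAY_SAMPLES = 50
DISCARDED_SAMPLES = 200
KEPT_SAMPLES = 512

travel_samples = round(DISTANCE_M / SPEED_OF_SOUND_M_S * SAMPLE_RATE_HZ)
onset_in_raw = travel_samples + SYSTEM_DELAY_SAMPLES   # within the 16383 points
onset_in_file = onset_in_raw - DISCARDED_SAMPLES       # within the 512 points
duration_ms = 1000.0 * KEPT_SAMPLES / SAMPLE_RATE_HZ
```

With these numbers the head response begins about 30 samples into each
stored 512 point response, which spans roughly 11.6 milliseconds.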

The impulse responses are stored as 16-bit signed integers, with the
most significant byte stored in the low address (i.e. Motorola 68000
format). The dynamic range of the 16-bit integers (96 dB) exceeds the
signal to noise ratio of the measurements, which we conservatively
measured to be 65 dB. Using the 0 degree elevation, 0 degree azimuth,
left ear, 16383 point measurement, we compared the energy in 100
samples centered on the head response to the first 100 samples of the
response (these should ideally be zero) which yielded the 65 dB SNR.
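
Because the samples are stored most significant byte first, they need to
be decoded as big-endian on most current machines. The sketch below shows
one way to do that with the Python standard library, together with an
energy-ratio helper in the spirit of the SNR estimate described above.
The function names are assumptions, and the windows passed to
energy_ratio_db are left to the reader since the exact sample ranges
used by the authors are not given.

```python
import math
import struct


def decode_int16_be(raw):
    """Decode big-endian (most significant byte first) signed 16-bit samples."""
    count = len(raw) // 2
    return list(struct.unpack(">%dh" % count, raw[:2 * count]))


def read_response(path):
    """Read one response file; 'path' is whatever local path holds the data."""
    with open(path, "rb") as handle:
        return decode_int16_be(handle.read())


def energy_ratio_db(signal_window, noise_window):
    """Ratio of the energies of two equal-length windows, in dB."""
    signal_energy = sum(x * x for x in signal_window)
    noise_energy = sum(x * x for x in noise_window)
    return 10.0 * math.log10(signal_energy / noise_energy)


dynamic_range_db = 20.0 * math.log10(2 ** 16)   # about 96.3 dB for 16 bits
example = decode_int16_be(b"\x00\x01\xff\xff\x80\x00")
```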

The HRTF data is stored in directories by elevation. Each directory
name has the format "elevEE", where EE is the elevation angle.
Within each directory each filename has the format "XEEeAAAa.dat"
where X is either "L" or "R" for left and right ear response,
respectively, EE is the elevation angle of the source in degrees, from
-40 to 90, and AAA is the azimuth of the source in degrees, from 0 to
355. Elevation and azimuth angles indicate the location of the source
relative to the KEMAR, such that elevation 0 azimuth 0 is directly in
front of the KEMAR, elevation 90 is directly above the KEMAR,
elevation 0 azimuth 90 is directly to the right of the KEMAR, etc.
For example, the file "R-20e270a.dat" is the right ear response,
with the source 20 degrees below the horizontal plane and 90 degrees
to the left of the head. Note that three digits are always given for
azimuth so that the files appear in sorted order in each directory.
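
The naming scheme above maps directly onto a format string. The helpers
below build and parse names under that scheme; the function names are
hypothetical and the code only encodes the rules stated in the text.

```python
import re


def hrtf_filename(ear, elevation, azimuth):
    """Build a name such as 'R-20e270a.dat' (azimuth always three digits)."""
    if ear not in ("L", "R"):
        raise ValueError("ear must be 'L' or 'R'")
    return "%s%de%03da.dat" % (ear, elevation, azimuth)


def hrtf_directory(elevation):
    return "elev%d" % elevation


_NAME = re.compile(r"^([LR])(-?\d+)e(\d{3})a\.dat$")


def parse_hrtf_filename(name):
    """Return (ear, elevation, azimuth) for a name in the XEEeAAAa.dat format."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError("not an HRTF filename: %r" % name)
    return match.group(1), int(match.group(2)), int(match.group(3))
```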

To select a pair of HRTF responses, we recommend using symmetrical
responses obtained from one of the KEMAR ears. For instance, for the
HRTF responses for a source 45 degrees to the right of the head at 0
degrees elevation, use "L0e045a.dat" for the left ear and
"L0e315a.dat" for the right ear, or use "R0e315a.dat" for the left
ear and "R0e045a.dat" for the right ear. Note that this approach
eliminates binaural localization cues in the median plane.
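
The selection rule amounts to mirroring the azimuth about the median
plane for the opposite ear. A minimal sketch, using a hypothetical
function name, is:

```python
def symmetric_pair(elevation, azimuth, ear="L"):
    """Return (left_ear_file, right_ear_file) built from one KEMAR ear."""
    mirrored = (360 - azimuth) % 360

    def name(az):
        return "%s%de%03da.dat" % (ear, elevation, az)

    if ear == "L":
        return name(azimuth), name(mirrored)
    if ear == "R":
        return name(mirrored), name(azimuth)
    raise ValueError("ear must be 'L' or 'R'")
```

For a source in the median plane (azimuth 0) both ears receive the same
file, which is why this approach removes interaural differences there.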

The largest-magnitude sample value in the left ear HRTF data is -26793
in file "L40e289a.dat". In the right ear HRTF data the maximum value is
29877 in the file "R40e039a.dat".

The speaker impulse response and headphone impulse responses are
stored in the directory "headphones+spkr". An inverse filter for
the Optimus Pro 7 speaker is included. The inverse filter was
designed by zero-padding the measured impulse response and taking the
DFT of the zero-padded sequence. The resulting complex spectrum was
inverted by negating the phase and inverting the magnitude. This was
done over the range from DC to 18 kHz; beyond 18 kHz the inverse
spectrum was made flat by repeating the 18 kHz magnitude value. The
inverse filter was obtained by computing the inverse DFT of this
spectrum. A minimum phase version of this inverse filter was also
computed using the real cepstrum (see [1]). The files in the
"headphones+spkr" directory are listed in Table 3.

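The inverse filter design described above can be sketched as follows.
This is an illustration under several assumptions, not the authors'
code: it uses a naive O(n^2) DFT so that it needs only the standard
library, it keeps the negated phase above the cutoff while holding the
magnitude (the text specifies only the magnitude), it fails if any bin of
the spectrum is exactly zero, and it does not cover the minimum phase
version. The function names and the toy input are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def design_inverse_filter(h, n_fft, fs=44100.0, cutoff_hz=18000.0):
    padded = list(h) + [0.0] * (n_fft - len(h))      # zero-pad
    spectrum = dft(padded)
    k_cut = int(cutoff_hz * n_fft / fs)              # last bin at or below cutoff
    half = n_fft // 2
    held_mag = 1.0 / abs(spectrum[k_cut])
    inv = [0j] * n_fft
    for k in range(half + 1):
        # Invert the magnitude up to the cutoff, hold it flat beyond.
        mag = 1.0 / abs(spectrum[k]) if k <= k_cut else held_mag
        # Negate the phase (assumed to apply above the cutoff as well).
        inv[k] = cmath.rect(mag, -cmath.phase(spectrum[k]))
    for k in range(1, (n_fft + 1) // 2):
        inv[n_fft - k] = inv[k].conjugate()          # keep the result real
    return [v.real for v in dft(inv, inverse=True)]


measured = [1.0, 0.5]   # toy response, not the Optimus Pro 7 data
inverse = design_inverse_filter(measured, n_fft=32)
```

Below the cutoff the product of the two spectra is one, so the cascade of
the measured response and the inverse filter is flat in that band.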


filename                 description
Optimus.dat              Optimus Pro 7 impulse response
Opti_inverse.dat         Inverse filter for Optimus Pro 7
Opti_minphase.dat        Minimum phase inverse filter
AKG-K240-L.dat           AKG headphone impulse response
AKG-K240-R.dat
Senn-HD480-L.dat         Sennheiser headphone impulse response
Senn-HD480-R.dat
RS-Nova38-L.dat          Radio Shack headphone impulse response
RS-Nova38-R.dat
Sony-TwinTurbo-L.dat     Sony headphone impulse response
Sony-TwinTurbo-R.dat

Table 3: Contents of "headphones+spkr" directory



The 512 point impulse responses and speaker and headphone data may be
found in the tar archive "full.tar.Z".

Compact data files 

For those interested purely in 3-D audio synthesis, we have included a
data-reduced set of 128 point symmetrical HRTFs derived from the left
ear KEMAR responses. These have also been equalized to compensate for
the non-uniform response of the Optimus Pro 7 speaker. The 128 point
responses may be found in the tar archive "compact.tar.Z". The
data-reduced impulse responses are stored in directories by elevation
as described above. Within each directory each filename has the
format "HEEeAAAa.dat" where EE is the elevation angle of the source
in degrees, and AAA is the azimuth angle of the source in degrees.

Each file contains a stereo pair of 128 point impulse responses
corresponding to the left and right ear responses for the given source
position. For instance, the file "H0e090a.dat" contains the left
and right ear impulse responses for a source directly to the right of
the listener. The left response was derived from the 512 point file
"L0e090a.dat" and the right response was derived from the 512 point
file "L0e270a.dat". The data is stored as 16-bit integers and the
stereo samples are stored in (left, right) interleaved order. Each
128 point response was obtained by convolving the appropriate 512
point impulse responses with the minimum phase inverse filter for the
Optimus Pro 7 speaker. The resulting impulse responses were then
cropped by retaining 128 samples starting at sample index 26. The
maximum sample value in the 128 point data is 30496 in the file
"H-10e100a.dat".

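Reading a compact file means separating the interleaved stereo samples,
and the cropping step is a simple slice. The sketch below illustrates
both. Big-endian byte order is an assumption carried over from the full
data set, since this section only says the samples are 16-bit integers;
the function names are hypothetical.

```python
import struct


def split_stereo(raw):
    """Split interleaved (left, right) 16-bit samples into two lists.

    Big-endian byte order is assumed here, matching the full data set.
    """
    count = len(raw) // 2
    samples = struct.unpack(">%dh" % count, raw[:2 * count])
    return list(samples[0::2]), list(samples[1::2])


def crop(response, start=26, length=128):
    """Keep 'length' samples beginning at sample index 'start'."""
    return response[start:start + length]


raw = struct.pack(">6h", 10, -10, 20, -20, 30, -30)   # three stereo frames
left, right = split_stereo(raw)
```

A complete 128 point stereo file holds 256 samples, or 512 bytes.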
Accessing the data on the Internet 

The data is organized into two tar archives, this document (postscript 
and plain text) and a text README file. The structure of the tar 
archives is described in the previous sections. 

To retrieve the HRTF data by anonymous FTP, your FTP session would 
look something like the following: 

kdm@eno:~ > ftp sound.media.mit.edu
Connected to sound.media.mit.edu.
220 sound.media.mit.edu FTP server (ULTRIX Version 4.1 Tue Mar 19 00:38:17 EST 1991
Name (sound.media.mit.edu:kdm): anonymous
331 Guest login ok, send ident as password.
Password: {Type your User ID here}
230 Guest login ok, access restrictions apply.
ftp> cd pub
250 CWD command successful.
ftp> cd Data
250 CWD command successful.
ftp> cd KEMAR
250 CWD command successful.
ftp> ls
200 PORT command successful.
150 Opening data connection for /bin/ls (18.85.0.105,3975) (0 bytes).
README
compact.tar.Z
full.tar.Z
hrtfdoc.ps
hrtfdoc.txt
226 Transfer complete.
60 bytes received in 0.42 seconds (0.14 Kbytes/s)
ftp> binary
200 Type set to I.
ftp> get README
200 PORT command successful.
150 Opening data connection for README (18.85.0.105,3806) (417 bytes).
226 Transfer complete.
local: README remote: README
952 bytes received in 0.043 seconds (22 Kbytes/s)



etc.

Please note that there are no files shared between the two tar archive
files. To expand the tar archives, use:

kdm@eno:~ > uncompress full.tar.Z
kdm@eno:~ > tar xvf full.tar
kdm@eno:~ > uncompress compact.tar.Z
kdm@eno:~ > tar xvf compact.tar

This will create the directories "full" and "compact".

To retrieve the HRTF data via the WWW, use your browser to open the
following URL:

http://sound.media.mit.edu/KEMAR.html

Simply follow the directions found in the html document.

Usage restrictions

This HRTF data is Copyright 1994 by the MIT Media Lab. It is provided
without any usage restrictions. We request that you cite the authors
when using this data for research or commercial applications.

Correspondence

All correspondence regarding this data should be directed to:

Keith Martin
MIT Media Lab, E15-401D
20 Ames Street
Cambridge, MA 02139

or

Bill Gardner
MIT Media Lab, E15-401B
20 Ames Street
Cambridge, MA 02139
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abstract 

An extensive set of head-related transfer function (In this document, 
we use the acronym HRTF to refer to head related impulse responses. 
The impulse response and transfer function are related in the obvious 
way by the Fourier transform.) measurements of a KEMAR dummy head 
microphone has recently been completed. The measurements consist of 
the left and right ear impulse responses, from a Realistic Optimus Pro 
7 loudspeaker mounted 1.4 meters from the KEMAR. Maximum length (ML) 
poeudo- random binary sequences were used to obtain the impulse 
responses at a sampling rate of 44.1 kHz. In total, 710 different 
positions were sampled at elevations from -40 degrees to +90 degrees. 
Also measured were the impulse response of the speaker in free field 
and several headphones placed on the KEMAR. This data is being made 
available to the research community on the Internet via anonymous FTP 
and the World Wide Web. 

Measurement technique 

Measurements were made using a Macintosh Quadra computer equipped with 
an Audiomedia II DSP card, which has 16 -bit stereo A/D and D/A 
converters that operate at a 44.1 kHz sampling rate. One of the audio 
output channels was sent to an amplifier which drove a Realistic 
Optimus Pro 7 loudspeaker. This is a small two way loudspeaker with a 
4 inch woofer and 1 inch tweeter. The KEMAR, Knowles Electronics 
model DB-4004, was equipped with model DD-061 left pinna,* model DB-065 
(large red) right pinna, Etymotic ER-11 microphones, and Etymotic 
ER-11 preamplifiers. The outputs of the microphone preamplifiers were 
connected to the stereo inputs of the Audiomedia card. 

From the standpoint of the Audiomedia card, a signal sent to the audio 
outputs resultB in a corresponding signal appearing at the audio 
inputs. Measuring the impulse response of this system yields the 
impulse response of the combined system consisting of the Audiomedia 
D/A and A/D converters and anti-alias filters, the amplifier, the 
speaker, tlie room in which the measurements are made, and most 
importantly, the response of the KEMAR with its associated microphones 
and preamps. We can avoid interference due to room reflections by 
ensuring that any reflections occur well after the head response time, 
which is several milliseconds. We can compensate for a non-uniform 
speaker response by measuring the speaker response separately and 
creating an inverse filter. The inverse filter, when applied to an 
HRTF measurement, equalizes the speaker response to be flat. 

The impulse responses were obtained using ML sequences (for a detailed 
description of the ML sequence measurement technique, see C2] ) . The 
sequence length was N = 16383 samples, corresponding to a 14 -bit 
generating register. Two copies of the sequence were concatenated to 
form a 2*N sample sound which was played from the Audiomedia card. 
Simultaneously, 2*N samples were recorded on both the left and right 
input channels (we wrote software for the Audiomedia to simultaneously 
play and record stereo sounds). For each input channel, the following 
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technique was used to recover the impulse response. The first N 
samples of the result were discarded, and the remaining N samples were 
duplicated to form a 2*N sample sequence. This was cross -correlated 
with the original N sample ML sequence using FFT based block 
convolution, forming a 3*N - 1 sample result. The N sample impulse 
response was extracted starting at N - l samples into this result. 

NoiGe in the Ml* sequence impulse responses can be attributed to 
measurement noise, non-linearities in the system, and time aliasing. 
Measurement noise can be averaged out by using longer ML sequences. 
This is completely analagous to averaging smaller length measurements. 
For instance, averaging two independent N point impulse response 
measurements should achieve a 3 dB signal to noise ratio (SNR) 
improvement over either of the measurements considered alone. 
Similarly, using a 2*N(+1) point ML sequence should achieve a 3 dB SNR 
improvement over using an N point ML sequence. However, noise caused 
by non-linearities in the system will not be reduced by repeated 
averaging over ML sequence measurements, because the noise is 
correlated between measurements. It is necessary either to use longer 
ML sequences or to average the reponses resulting from different ML 
sequences (i.e. from different masks) to reduce noise caused by 
non-linearities (see [3] > . Time aliasing can be eliminated by using 
ML sequences which are longer than the reverberation time of the 
measurement space. Since the measurements were done in an anechoic 
chamber and the ML sequences were sufficiently long, time aliasing was 
not a problem. We chose 163 83 point measurements to give good signal 
to noise ratios without excessive storage requirements or computation 
time. The measured SNR was 65 dB, as discussed later. 

Measurement procedure 

The measurements were made in MIT's anechoic chamber. The KEMAR was 
mounted upright on a motorized turntable which could be rotated 
accurately to any azimuth under computer control. The speaker was 
mounted on a boom stand which enabled accurate positioning of the 
speaker to any elevation with respect to the KEMAR. Thus, the 
measurements were made one elevation at a time, by setting the speaker 
to the proper elevation and then rotating the KEMAR to each azimuth. 
With the KEMAR facing forward toward the speaker (0 degrees azimuth), 
the speaker was positioned such that a normal ray projected from the' 
center of the face of the speaker bisected the interaural axis of the 
KEMAR at a distance of 1.4 meters. This was accomplished using a tape 
measure, plumb line, calculator, a 1.4 meter rod, and a fair amount of 
eyeballing. We believe the speaker was always within 0.5 inch of the 
desired position, which corresponds to an angular error of +/- 0.5 
degrees. 

The spherical space around the KEMAR was sampled at elevations from 
-40 degrees (40 degrees below the horizontal plane) to +90 degrees 
(directly overhead) . At each elevation, a full 360 degrees of azimuth 
was sampled in equal si2ed increments. The increment sizes were 
chosen to maintain approximately 5 degree great -circle increments. ' 
The table below shows the number of samples and azimuth increment at 
each elevation (all angles in degrees) . A total of 710 locations were 
sampled . 

Elevation Number of Azimuth 
Measurements Increment 
-40 56 6.43 
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Table 1 : Number of measurements and azimuth 
increment at each elevation 

If the KEMAR was perfectly symmetrical and its ear microphones were 
xdentical, we would only need to sample either the left or right 
hemisphere around the KEMAR. However, our KEMAR had two different 
pinnae (the left pinna was "normal", the right pinna was the ^ large 
red'* model), and consequently the responses were not identical. This 
was actually a bonus, because by sampling the entire sphere we 
obtained two complete oets of symmetrical HRTFs . 

Speaker and headphone measurements 

The impulse response of the Optimus Pro 7 speaker was measured in the 
anechoic chamber using a Neumann KMi 84 microphone at a distance of 
1.4 meters. The measurement technique was exactly the same as the 
HRTF measurements. The speaker impulse response can be used to create 
an inverse filter to equalize the HRTF measurements, as will be 
discussed later. 



In addition to measuring the speaker response, we also measured 
variety of headphones placed on the KEMAR. The headphones measured 



a 



are listed in Table 2. 



AKG K24 0 Circumaural, closed earcups, but 

not well isolated. 
Sennheiser HD480 Supraaural, open air. 

Radio Shack Nova 38 Supraaural, walkman style. 
Sony Twin Turbo Intraaural, earplug style. 

Table 2: Description of headphones measured 

It is possible the HRTF data will be used to create a spatial auditory 
display, m which case the frequency response of the headphones used 
to render the display is important. The above headphone responses may 
be useful co create appropriate inverse filters. We did not gather 
data on the repeatablitity of such measurements (i.e. how much 
variation in the frequency response is expected each time the 
headphones arc placed on the head) . 

The data 

As described earlier, each HRTF measurement yielded a 16383 point 
impulse response at a 44.1 kHz sampling rate. Most of this data is 
irrelevant. The 1.4 meter air travel corresponds to approximately 180 
samples, and there is an additional delay of 50 samples inherent in 
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the playback/recording system. Consequently, in each impulse 
response, there is a delay of approximately 230 samples before the 
head response occurs. The head response persists for several hundred 
samples (subject to interpretation) and is followed by various 
reflections off objects in the anechoic chamber (such as the KEMAR 
turntable) . In order to reduce the size of the data set without 
eliminating anything of potential interest, we decided to discard the 
first 200 samples of each impulse response and save the next 512 
samples. Each HRTF response is thus 512 samples long. Most 
researchers will no doubt truncate this data further. 

The impulse responses are stored as 16-bit signed integers, with the 
most significant byte stored in the low address (i.e. Motorola 68000 
format) . The dynamic range of the 16 -bit integers (96 dB) exceeds the 
signal to noise ratio of the measurements, which we conservatively 
measured to be 65 dB . Using the 0 degree elevation, 0 degree azimuth, 
left ear, 163 83 point measurement, we compared the energy in 100 
samples centered on the head response to the first 100 samples of the 
response (these should Ideally be zero} which yielded the 65 dB SNR, 

The HRTF data is stored in directories by elevation. Each directory 
name has the format "elevEE 1 ', where EE is the elevation angle, 
within each directory each filename has the format "XEEeAAAa. dat 1 ' 
where X is either "L • ■ or * * R 1 • for left and right ear response, 
respectively, EE is the elevation angle of the source in degrees, from 
-4 0 to 90, and AAA is the azimuth of the source in degrees, from 0 to 
355. Elevation and azimuth angles indicate the location of the source 
relative to the KEMAR, such that elevation 0 azimuth 0 is directly in 
front of the KEMAR, elevation 9 0 is directly above the KEMAR, 
elevation 0 azimuth 90 is directly to the right of the KEMAJR, etc. 
For example, the file *"*R - 20e270a . dat ' • is the right ear response, 
with the source 2 0 degrees below the horizontal plane and 90 degrees 
to the left of the head. Note that three digits are always given for 
azimuth so that the files appear in sorted order in each directory. 

To select a pair of HRTF responses, we recommend using symmetrical
responses obtained from one of the KEMAR ears. For instance, for the
HRTF responses for a source 45 degrees to the right of the head at 0
degrees elevation, use "L0e045a.dat" for the left ear and
"L0e315a.dat" for the right ear, or use "R0e315a.dat" for the left
ear and "R0e045a.dat" for the right ear. Note that this approach
eliminates binaural localization cues in the median plane.
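
The selection rule in this paragraph amounts to mirroring the azimuth about the median plane. A minimal sketch follows; the function name symmetric_pair is hypothetical, and it is an assumption that the mirrored azimuth was actually measured at the chosen elevation.

```python
def symmetric_pair(elevation, azimuth, ear="L"):
    """Return (left_ear_file, right_ear_file), both taken from the
    measurements of a single KEMAR ear ("L" or "R")."""
    if ear not in ("L", "R"):
        raise ValueError("ear must be 'L' or 'R'")
    mirrored = (360 - azimuth) % 360
    same = "%s%de%03da.dat" % (ear, elevation, azimuth)
    mirror = "%s%de%03da.dat" % (ear, elevation, mirrored)
    if ear == "L":
        # The left ear data serves the left ear directly; the right ear
        # uses the left ear response at the mirrored azimuth.
        return same, mirror
    return mirror, same

from_left_ear = symmetric_pair(0, 45, ear="L")
from_right_ear = symmetric_pair(0, 45, ear="R")
```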

The maximum sample value (largest in magnitude) in the left ear HRTF
data is -26793 in file "L40e289a.dat". In the right ear HRTF data the
maximum value is 29677 in the file "R40e039a.dat".

The speaker impulse response and headphone impulse responses are
stored in the directory "headphones+spkr". An inverse filter for
the Optimus Pro 7 speaker is included. The inverse filter was
designed by zero-padding the measured impulse response and taking the
DFT of the zero-padded sequence. The resulting complex spectrum was
inverted by negating the phase and inverting the magnitude. This was
done over the range from DC to 18 kHz; beyond 18 kHz the inverse
spectrum was made flat by repeating the 18 kHz magnitude value. The
inverse filter was obtained by computing the inverse DFT of this
spectrum. A minimum phase version of this inverse filter was also
computed using the real cepstrum (see [1]). The files in the
"headphones+spkr" directory are listed in Table 3.



filename                  description
Optimus.dat               Optimus Pro 7 impulse response
Opti_inverse.dat          Inverse filter for Optimus Pro 7
Opti_minphase.dat         Minimum phase inverse filter
AKG-K240-L.dat            AKG headphone impulse response
AKG-K240-R.dat
Senn-HD480-L.dat          Sennheiser headphone impulse response
Senn-HD480-R.dat
RS-Nova38-L.dat           Radio Shack headphone impulse response
RS-Nova38-R.dat
Sony-TwinTurbo-L.dat      Sony headphone impulse response
Sony-TwinTurbo-R.dat

Table 3: Contents of "headphones+spkr" directory
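
The inverse filter design described before Table 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, a naive DFT stands in for an FFT, the FFT size is chosen by the caller, and above the cutoff the negated measured phase is kept while the magnitude is held (the text specifies only the magnitude there). The minimum phase step via the real cepstrum is not shown.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for short examples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(spectrum):
    """Inverse DFT of a conjugate-symmetric spectrum, returning real samples."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def inverse_filter(impulse, fft_size, sample_rate=44100.0, cutoff_hz=18000.0):
    """Zero-pad, take the DFT, invert magnitude and negate phase up to the
    cutoff, hold the cutoff magnitude above it, then take the inverse DFT."""
    if fft_size % 2 or fft_size < len(impulse):
        raise ValueError("fft_size must be even and at least len(impulse)")
    padded = list(impulse) + [0.0] * (fft_size - len(impulse))
    spectrum = dft(padded)  # a zero-magnitude bin would make 1/|H| blow up
    half = fft_size // 2
    cutoff_bin = min(half, int(cutoff_hz * fft_size / sample_rate))
    held = 1.0 / abs(spectrum[cutoff_bin])  # magnitude repeated above cutoff
    inverse = [0j] * fft_size
    for k in range(half + 1):
        magnitude = 1.0 / abs(spectrum[k]) if k <= cutoff_bin else held
        inverse[k] = cmath.rect(magnitude, -cmath.phase(spectrum[k]))
    # Fill the negative-frequency bins so that the result is real.
    for k in range(1, half):
        inverse[fft_size - k] = inverse[k].conjugate()
    return idft_real(inverse)

# Toy two-tap "speaker" response; real use would pass the measured response.
demo_inverse = inverse_filter([1.0, 0.5], 16)
```

With a cutoff at or above half the sampling rate, circularly convolving the padded response with the result gives a unit impulse, which is a convenient sanity check.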



The 512 point impulse responses and speaker and headphone data may be
found in the tar archive "full.tar.Z".

Compact data files 

For those interested purely in 3-D audio synthesis, we have included a
data-reduced set of 128 point symmetrical HRTFs derived from the left
ear KEMAR responses. These have also been equalized to compensate for
the non-uniform response of the Optimus Pro 7 speaker. The 128 point
responses may be found in the tar archive "compact.tar.Z". The
data-reduced impulse responses are stored in directories by elevation
as described above. Within each directory each filename has the
format "HEEeAAAa.dat" where EE is the elevation angle of the source
in degrees, and AAA is the azimuth angle of the source in degrees.

Each file contains a stereo pair of 128 point impulse responses
corresponding to the left and right ear responses for the given source
position. For instance, the file "H0e090a.dat" contains the left
and right ear impulse responses for a source directly to the right of
the listener. The left response was derived from the 512 point file
"L0e090a.dat" and the right response was derived from the 512 point
file "L0e270a.dat". The data is stored as 16-bit integers and the
stereo samples are stored in (left, right) interleaved order. Each
128 point response was obtained by convolving the appropriate 512
point impulse responses with the minimum phase inverse filter for the
Optimus Pro 7 speaker. The resulting impulse responses were then
cropped by retaining 128 samples starting at sample index 26. The
maximum sample value in the 128 point data is 30496 in the file
"H-10e100a.dat".

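The interleaved layout of a compact file can be unpacked as sketched below. The function name read_compact_pair is hypothetical, and the byte order is assumed to be the same most-significant-byte-first order used for the full data set; the text above only says "16-bit integers". The example uses synthetic bytes, not a file from the archive.

```python
import struct

def read_compact_pair(data, length=128):
    """Split interleaved (left, right) signed 16-bit samples into two lists.

    Assumes big-endian samples, as in the full data set.
    """
    expected = 2 * length * 2  # two channels, two bytes per sample
    if len(data) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(data)))
    values = struct.unpack(">%dh" % (2 * length), data)
    left = list(values[0::2])   # even positions hold left ear samples
    right = list(values[1::2])  # odd positions hold right ear samples
    return left, right

# Synthetic stand-in for the contents of a file such as "H0e090a.dat".
interleaved = [v for i in range(128) for v in (i, -i)]
left, right = read_compact_pair(struct.pack(">256h", *interleaved))
```
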
Accessing the data on the Internet 

The data is organized into two tar archives, this document (postscript 
and plain text) and a text README file. The structure of the tar 
archives is described in the previous sections. 

To retrieve the HRTF data by anonymous FTP, your FTP session would 
look something like the following: 

kdm@eno:~> ftp sound.media.mit.edu
Connected to sound.media.mit.edu.
220 sound.media.mit.edu FTP server (ULTRIX Version 4.1 Tue Mar 19 00:38:17 EST 1991
Name (sound.media.mit.edu:kdm): anonymous
331 Guest login ok, send ident as password.
Password: {Type your User ID here}
230 Guest login ok, access restrictions apply.
ftp> cd pub
250 CWD command successful.
ftp> cd Data
250 CWD command successful.
ftp> cd KEMAR
250 CWD command successful.
ftp> ls
200 PORT command successful.
150 Opening data connection for /bin/ls (18.85.0.105,3975) (0 bytes).
README
compact.tar.Z
full.tar.Z
hrtfdoc.ps
hrtfdoc.txt
226 Transfer complete.
60 bytes received in 0.42 seconds (0.14 Kbytes/s)
ftp> binary
200 Type set to I.
ftp> get README
200 PORT command successful.
150 Opening data connection for README (18.85.0.105,3806) (417 bytes).
226 Transfer complete.
local: README remote: README
952 bytes received in 0.043 seconds (22 Kbytes/s)
etc.

Please note that there are no files shared between the two tar archive
files. To expand the tar archives, use:

kdm@eno:~> uncompress full.tar.Z
kdm@eno:~> tar xvf full.tar
kdm@eno:~> uncompress compact.tar.Z
kdm@eno:~> tar xvf compact.tar

This will create the directories "full" and "compact".

To retrieve the HRTF data via the WWW, use your browser to open the
following URL:

http://sound.media.mit.edu/KEMAR.html

Simply follow the directions found in the html document.

Usage restrictions

This HRTF data is Copyright 1994 by the MIT Media Lab. It is provided
without any usage restrictions. We request that you cite the authors
when using this data for research or commercial applications.

Correspondence

All correspondence regarding this data should be directed to:

Keith Martin                    Bill Gardner
MIT Media Lab, E15-401D         MIT Media Lab, E15-401B
20 Ames Street           or     20 Ames Street
Cambridge, MA 02139             Cambridge, MA 02139